Identification of Retrotransposons using Transcriptome Data to Analyze Sugarcane Genomes and Chromosome Differentiation

 

Xiaoyang Wu1, Dan Chen1, Xin Lu2, Wenjie Long1, Xiongmei Ying2, Guoyan Zhou1, Juan Du1, Shaoyun Wu1 and Qing Cai1*

1Biotechnology and Germplasm Resources Institute, Yunnan Academy of Agricultural Sciences, Kunming 650205, People’s Republic of China

2Sugarcane Research Institute, Yunnan Academy of Agricultural Sciences, Kaiyuan 661699, People’s Republic of China

*Correspondence author: caiqingysri@163.com

Received 09 April 2021; Accepted 17 August 2021; Published 15 November 2021

 

Abstract

 

Sugarcane belongs to allopolyploid and is an important species for polyploid research. Retrotransposons are an important part of plant genome, however the composition of retrotransposons in sugarcane genome and their effects are not very clear. In this study, six selfing progenies of Yunzhe 0786 were used as research materials, and the buds of their nodes were used for transcriptome sequencing. More than 80 GB of sequence data were obtained. A total of 97106 unigenes were obtained by sequence assembly without a reference genome. Unigenes were annotated based on the related proteins of retrotransposons, and much sequence information was obtained. One of the unigenes containing nested transposon loci was attracted our attention. Sequence structure analysis showed that this locus was a Line transposon inserted into an LTR transposon. The LTR transposon belongs to the Ty3-Gypsy type, and the Line transposon belongs to the Sof-RTE type. Sof-RTE transposons are a type of transposon with a high copy number in the sugarcane genome. Through analysis of the distribution of Sof-RTE transposons in the genomes of S. spontaneum and S. officinarum, it was found that the distribution of these transposons was site-specific in the genome and chromosome, and there were structural differences among chromosomes and genomes. Therefore, we believe that the activities of transposons such as Sof-RTE resulted in the differentiation of sugarcane genomes and chromosomes after polyploidization to a certain extent. © 2021 Friends Science Publishers

 

Keywords: S. spontaneum; S. officinarum; Genome; Retrotransposon; Differentiation

 


Introduction

 

Modern cultivated sugarcane (Saccharum hybrids spp.) is a perennial crop with aneuploidy and polyploidy. It is usually asexually propagated in production and the most important source of sugar in the world, accounting for 80% of the world’s total sugar production (Santchurna et al. 2014). The botanical classification of sugarcane is Poaceae, Panicoideae, Andropogoneae, Saccharinae and Saccharum L. (Fang et al. 2014). classified sugarcane into wild species and cultivated species. The wild species were further divided into S. spontaneum and S. robustum. The cultivated species were divided into S. sinense, S. edule, S. bareri and S. officinarum (Yang et al. 2014). Some studies suggest that S. officinarum evolved or was domesticated from S. robustum (Brandes 1956; D’Hont et al. 1993, 1998; Racedo et al. 2016).

S. spontaneum, S. robustum and S. officinarum are homologous polyploids, while S. sinense, S. edule, S. barberi and modern sugarcane are interspecific allopolyploid hybrids. Among modern sugarcane, S. officinarum and S. spontaneum contribute most of the genome components, and ~70 to 80% of the genes come from S. officinarum, ~10 to 20% from S. spontaneum and 10% from recombinant chromosomes (Hoarau et al. 2002; Ming et al. 2006; Nishiyama et al. 2014). The ploidy level of the sugarcane genome varies from 5 to 16× and the genome size is ~10 Gb. The sugarcane genome is complex and contains 812 homologous aneuploid chromosomes. By determining the content of nuclear DNA in a large number of sugarcane samples from different varieties and ploidies, it was found that the genome size of S. officinarum was estimated to be ~7.50–8.55 Gb, while that for S. robustum was ~7.56–11.78 Gb, and that for S. spontaneum was ~3.36–12.64 Gb (Zhang et al. 2012). The genome size of sugarcane from different sources depends on the composition of chromosomes. Sugarcane genome composition makes it not only an important cash crop, but also an important plant for genetic and polyploid genome phylogenetic research in plants.

Polyploidy exists widely in nature and is an important driving force for the evolution of eukaryotes (Otto 2007); about 30 ~ 70% of angiosperms are considered to have polyploid ancestors (Masterson 1994). After polyploid formation, it will be further diploidized, its genome and chromosome will differentiate, and its diploid genetic behavior will be restored to maintain genetic stability (Doyle et al. 2008; Leitch and Leitch 2008). Transposon activity is considered to be one of the main reasons for diploidization (Feldman and Levy 2012).

Transposons are important genetic elements that can jump between or within chromosomes in the genome (i.e., jumping genes). Genomes of gramineous crops such as rice (Yu et al. 2002), maize (Schnable et al. 2009) and wheat (IWGSC 2014) have been sequenced. Transposons are one of the main components of these genomes. The classification of transposons can be divided into DNA transposons and RNA transposons (Wicker et al. 2007; Bourque et al. 2018). RNA transposons are further divided into long terminal repeat (LTR) retrotransposons and non-LTR retrotransposons. LTR retrotransposons can be further classified into Ty1-Copia and Ty3-Gypsy types, while non-LTR retrotransposons can be further classified into Line and Sine types (Kumar and Bennetzen 1999). Line retrotransposons usually have two large reading frames with a polyA tail at its 3’-end. Transposons with target site repeat (TSD) structure are generally considered as complete transposons, while transposons with autonomous transposon capability are considered as full-length transposons.

At present, the genome of AP85441 an S. spontaneum line has been sequenced (Zhang et al. 2018), which provides basic data for the study of sugarcane genome composition. Ap85441 is an autotetraploid with a chromosome base of eight. It is composed of four sets of genomes and contains 32 pairs of chromosomes.

Yunzhe0786 is a new sugarcane germplasm that is generated from crosses between wild species and varieties (Chen et al. 2020). Its complex pedigree can be tracked back to S. spontaneum, S. officinarum and S. robustum. Transcriptome sequencing is a commonly used and simplified genome sequencing technology that can be applied to complex genome research. In this study, selfing progenies of Yunzhe0786 were detected by transcriptome sequencing, and retrotransposons were identified using these transcriptome data. This annotation information was used to further study the genome and chromosome differentiation of sugarcane.

 

Materials and Methods

 

Plant material

 

Plant materials (sugar cane genotypes Yunzhe0786) were provided by the National Germplasm Repository of Sugarcane, P.R. China (Kaiyuan city, Yunnan province). Yunzhe07-86 is a new sugarcane genotype that generated from the crosses of wild species and varieties. Its complex pedigree can be tracked back to S. spontaneum, S. officinarum and S. robustum. Six selfing progenies of Yunzhe0786, sister lines12137, 12139, 12149, 12150, 12171 and 12174 were separately used to transcriptome sequencingand the axillary buds formed in adult stage were selected. The axillary bud tissues were sampled from the cane nodes as experimental samples.

 

Transcriptome sequencing and sequence assembly

 

Transcriptome sequencing was performed on samples of axillary bud tissues at the adult plant stage; these samples were used for RNA extraction, reverse transcription and the construction of a DNA library. After the library was qualified, the bidirectional sequencing of the cDNA library was performed using the Illumina HiSeq high-throughput sequencing platform. The reading length was 150-bp. Raw data were filtered to obtain high-quality clean data by removing the connection sequence and low-quality reads. Clean data were used for sequence assembly without a reference genome, and Trinity software (Grabherr et al. 2011) was used to perform this work.

 

Annotation of retrotransposons in unigene

 

The unigene sequences obtained by sequence assembly were annotated for retrotransposons. The related protein sequences of conserved LTR transposon domains were used for annotation of LTR type retrotransposons (Neumann et al. 2019). Line retrotransposons were annotated using 236 related protein sequences of the conserved Line transposon domains in maize genome (Supplementary 1). BlastX, tBlastn and CDD (Marchler-Bauer et al. 2017) in NCBI were used to analyse conserved domains of transposons. ORFfinder in NCBI was used to analyse the open reading frame (ORF). Clustal X (1.8) software was used for sequence alignment (Thompson et al. 1994). MEGA 5.0 (Kumar et al. 1994) was used for phylogenetic tree construction.

 

Cloning of Line transposons

 

Genomic DNA was extracted from young leaves and used for transposon cloning (Doyle and Doyle 1990). Based on the identification of the LTR transposon sequence in the AP85441 genome, molecular markers were developed to clone the full-length sequence of the Line transposon at the nested transposon site. Molecular markers were designed online using Primer3 (http://bioinfo.ut.ee/primer3-0.4.0/primer3/) (Rozen and Skaletsky 2000). Cloning was performed with two fragments and two pairs of primers, F1 5’-TCCAGAGTTTCCAGGGAGTG-3’, and R1 5’- TCTTCCTCTCGCGGATTCTA -3’; F2 5’- AAAGGTCAGGCGTATGATGG -3’, and R2 5’- TGGTTCAAGATGCAGACCAG -3’.

The PCR system was performed in a final volume of 50 μL containing 50 ng template DNA, 1 U Taq polymerase, 1× PCR buffer, 0.2 mM dNTPs, and 0.2 μM of each primer. PCR was performed under the following conditions: 5 min at 94°C, 35 cycles of 60 s at 94°C, 60 s at annealing temperature (Tm), and 60 s at 72°C, and a final extension of 5 min at 72°C. The PCR products were then subjected to 1% agarose electrophoresis, cloning, and sequencing. The TA cloning vector was PMD19-T (TaKaRa, Japan). The cloned vector was sequenced using a 3730 DNA Analyzer (Applied Biosystems, America).

 

Identification of retrotransposons in the sugarcane genome

 

The genome sequence of S. spontaneum line AP85-441 (Zhang et al. 2018) with perfect assembly and its gff3 annotation file were downloaded from http://www.life.illinois.edu/ming/downloads/Spontaneum_genome/ and used to identify retrotransposons. Ap85-441 genome was assembled into 32 chromosomes. The transposon sequence was obtained and its chromosome location information was recorded at the same time.

For the identification of LTR retrotransposons, the annotated LTR retrotransposon-related unigene sequence was located in the reference genome by BLASTn; 10-kb sequences from upstream and downstream of this locus were extracted; the LTRs structure of LTR transposon was identified from the extracted sequence; and the TSD structure of the retrotransposon was further identified after its LTR structure was determined.

The original genome sequence of S. officinarum line Gp0114240 (sequence information is shown in Supplementary Table 1) was also used to identify retrotransposons. BLASTn (version 2.2.23+; Altschul 1997) was used for sequence blasting against the genome sequence. Blat was used for reads mapping to reference sequence. Perl scripts and regular expressions were used to identify transposons. Line retrotransposons were identified by describing their structure suing regular expressions, with a fragment size from 150 to 4000-bp; a 3’-terminal sequence end of (TTG) n and a 10–20 bp TSD structure. Perl scripts were used to extract sequences with the structural features from the genome sequence.

 

Results

 

Annotation of retrotransposons in unigenes

 

Through transcriptome sequencing, 84.78-Gb of clean data was obtained after quality control (Table 1). A total of 97106 unigenes were obtained after sequence assembly without a reference genome. By annotating retrotransposons, a series of unigenes related to retrotransposons were found. One of the unigenes (unigene number is c108043) with a special structure attracted our attention. This sequence has both LTR and Line retrotransposon domains in sequence. This unigene originated from a nested transposon site in the sugarcane genome. According to the order of conserved domains on the sequence, we know that this site is a Line retrotransposon inserted into the interior of an LTR retrotransposon (Fig. 1). This locus was located in the S. officinarum genome by BLASTn.

 

Cloning of Line retrotransposons

 

To understand the sequence characteristics of this nested transposon locus in the sugarcane genome, we first annotated the full-length sequence of this LTR retrotransposon in the AP85-441 genome using the partial sequence of the LTR retrotransposon in c108043. As a result, we obtained three full-length LTR retrotransposon sequences from the AP85-441 genome (Supplementary 2), suggesting that this is an LTR retrotransposon family with low copy number in sugarcane genome. This type of LTR retrotransposon is about ~14 Kb in length; its LTR is ~1.6 Kb in length; and it belongs to Ty3-Gypsy type; the ends of the LTR structure have a 4-bp terminal inverted repeats (TIR) structure; which usually has a 5-bp TSD structure, but the TSD sequence is not conserved.

The identification of LTR retrotransposons provides a sequence reference for target sites of Line retrotransposons. We designed primers and cloned the full-length Line retrotransposon at this site. This Line retrotransposon is 3384-bp in length; its TSD sequence is 5’-TGATGTCCCTTATCT-3’; its 3’-terminal ends is 11 5’-TTG-3’; and its sequence contains only one open reading frame; there are two conserved domains in its open reading frame, exonuclease / endonuclease / phosphatase (EEP, cd09076) and reverse transcriptase (RT, cd01650).

This Line retrotransposon belongs to the Sof-RTE family. Gao et al. (2017) first discovered this type of transposon in the Table 1: The transcriptome sequencing data obtained in this study

 

Samples

Read number

Base number

GC content (%)

% Q30 (%)

12-137

37,021,591

10,898,254,820

54.25

93.36

12-139

47,205,821

13,917,742,462

54.93

93.06

12-149

44,804,301

13,180,928,498

55.33

92.76

12-150

45,589,814

13,409,064,880

57.01

92.86

12-171

45,425,006

13,465,860,422

55.08

93.03

12-174

67,460,254

19,906,175,042

55.81

92.98

 

 

Fig 1: Nested transposon loci in the S. officinarum genome

At this locus, a Line retrotransposon was inserted into an LTR retrotransposons. Both transposons have a complete TSD structures. The LTR retrotransposons has two reading frames, Gag and Pol, Gag has a conserved domain CP (capsid-like proteins), and Pol has four conserved domains PR (protease), RT (reverse transcriptase), RNASEH (RNase-H) and INT (integrase). According to the order of conserved domains, the LTR retrotransposons belongs to the Ty3-Gypsy type. The Line transposon has only one reading frame Pol, Pol has two conserved domains EEP and RT, and its 3’-terminal ends has a poly 5’-TTG-3’ tail

 

 

Fig 2: Frequency distribution of Sof-RTE on different chromosomes of the S. spontaneum genome

In the figure, 1-8 represent the 8 homologous groups of S. spontaneum; A to D represent the A to D chromosomes of S. spontaneum

 

Arachis duranensis genome and annotated a family member Sof-RTE (KF184817, 9024193611) in the BAC sequence of the sugarcane genome. Comparing with the works of Gao et al. (2017), we found that the transposon we cloned was a full-length Sof-RTE transposon. This is the first Sof-RTE obtained in this study, which we called Sof-RTE-ck.

 

Annotation of Sof-RTE in the sugarcane Genome

 

Annotation of Sof-RTE in the S. officinarum genome was performed using the special structure (TTG) n at the 3’-terminus of Sof-RTE to isolate sequences containing the target site. After eliminating duplicates, 64,080 target site of Sof-RTE in the S. officinarum genome were obtained (Supplementary 4). Thus, Sof-RTE is a high-copy transposon in the sugarcane genome.

By annotating Sof-RTE in the S. spontaneum genome, based on the structural characteristics of the Sof-RTE transposon, 2369 sequences were identified (Fig. 2 and Supplementary. 5). By comparison with Line-related proteins, 1655 of these have Line protein coding sequences. Some of them are 5’-terminal truncated Sof-RTE transposons. For example, S. spontaneum-Sof-RTE-1731, located on chromosome 5B, has a length of 1592 bp, a TSD sequence of 5’-TCAATAGAAGGAAGA-3’, a 3’-terminus of six 5’-TTG-3’, and a reading frame that covers only part of Sof-RTE-ck (Fig. 3. and Supplementary Fig. 1).

All 1623 sequences had more than 95% similarity with Sof-RTE-ck. In addition, there was a certain proportion of sequences that had less than 80% similarity with Sof-RTE-ck. However, they still have the structural characteristics of Sof-RTE. By generating a phylogenetic tree, the Sof-RTE transposon can be further divided into two subfamilies (Fig. 4).

 

Table 2: Homologous loci between the S. officinarum and S. spontaneum genomes

 

Number

S. officinarum

Chr in S. spontaneum

Chromosome segment

1

S. officinarum-Sof-RTE-19860

Chr1A

106111357-106111258

2

S. officinarum-Sof-RTE-27843

Chr1B

6581776-6581875

3

S. officinarum-Sof-RTE-25564

Chr1B

24632359-24632260

4

S. officinarum-Sof-RTE-40464

Chr1C

22428752-22428653

5

S. officinarum-Sof-RTE-1843

Chr1C

74175599-74175500

6

S. officinarum-Sof-RTE-2335

Chr1D

1131556-1131655

7

S. officinarum-Sof-RTE-41574

Chr1D

89731189-89731288

8

S. officinarum-Sof-RTE-41574

Chr1D

89826842-89826941

9

S. officinarum-Sof-RTE-41574

Chr1D

91626648-91626549

10

S. officinarum-Sof-RTE-22346

Chr1D

104944989-104944890

11

S. officinarum-Sof-RTE-45139

Chr2A

10739590-10739491

12

S. officinarum-Sof-RTE-56705

Chr2A

16179220-16179121

13

S. officinarum-Sof-RTE-28984

Chr2A

121313131-121313230

14

S. officinarum-Sof-RTE-22078

Chr2B

5414006-5413907

15

S. officinarum-Sof-RTE-21186

Chr2B

9585567-9585666

16

S. officinarum-Sof-RTE-9661

Chr2B

10814620-10814719

17

S. officinarum-Sof-RTE-61228

Chr2B

13334082-13334181

18

S. officinarum-Sof-RTE-61228

Chr2B

13520903-13520804

19

S. officinarum-Sof-RTE-9661

Chr2B

51892192-51892093

20

S. officinarum-Sof-RTE-50243

Chr2B

88691390-88691489

21

S. officinarum-Sof-RTE-13155

Chr2C

8228582-8228483

22

S. officinarum-Sof-RTE-13155

Chr2C

8372924-8372825

23

S. officinarum-Sof-RTE-23057

Chr2C

10441229-10441130

24

S. officinarum-Sof-RTE-28726

Chr2C

12742638-12742539

25

S. officinarum-Sof-RTE-57667

Chr2C

16654576-16654675

26

S. officinarum-Sof-RTE-24068

Chr2C

107025007-107024908

27

S. officinarum-Sof-RTE-15485

Chr2D

4580685-4580586

28

S. officinarum-Sof-RTE-15485

Chr2D

4610527-4610626

29

S. officinarum-Sof-RTE-19601

Chr2D

15849006-15848907

30

S. officinarum-Sof-RTE-45430

Chr2D

83149575-83149674

31

S. officinarum-Sof-RTE-31537

Chr3B

1924081-1924180

32

S. officinarum-Sof-RTE-31537

Chr3B

1934881-1934980

33

S. officinarum-Sof-RTE-53554

Chr3B

7307962-7307863

34

S. officinarum-Sof-RTE-19629

Chr3B

15638469-15638568

35

S. officinarum-Sof-RTE-52066

Chr3D

2428956-2429055

36

S. officinarum-Sof-RTE-47415

Chr3D

54403261-54403360

37

S. officinarum-Sof-RTE-33464

Chr4B

67307808-67307709

38

S. officinarum-Sof-RTE-54217

Chr4C

5900120-5900219

39

S. officinarum-Sof-RTE-21052

Chr4C

47177877-47177976

40

S. officinarum-Sof-RTE-47876

Chr5A

46390937-46391036

41

S. officinarum-Sof-RTE-54664

Chr5B

3387867-3387768

42

S. officinarum-Sof-RTE-16803

Chr5B

45153628-45153529

43

S. officinarum-Sof-RTE-16803

Chr5B

45289674-45289575

44

S. officinarum-Sof-RTE-48465

Chr5B

46874481-46874580

45

S. officinarum-Sof-RTE-29906

Chr5B

47006227-47006128

46

S. officinarum-Sof-RTE-17535

Chr5C

63744421-63744322

47

S. officinarum-Sof-RTE-17535

Chr5C

63775422-63775323

48

S. officinarum-Sof-RTE-20752

Chr5D

2730626-2730527

49

S. officinarum-Sof-RTE-20752

Chr5D

2836968-2836869

50

S. officinarum-Sof-RTE-17942

Chr6B

59754200-59754101

51

S. officinarum-Sof-RTE-44844

Chr7B

3704118-3704217

52

S. officinarum-Sof-RTE-51883

Chr7D

7777073-7776974

53

S. officinarum-Sof-RTE-33508

Chr7D

14334279-14334180

54

S. officinarum-Sof-RTE-50246

Chr8D

4054789-4054888

55

S. officinarum-Sof-RTE-25411

Chr8D

7002287-7002386

 

 

Distribution of Sof-RTE in the sugarcane genome

 

Analysis of homologous loci in the S. officinarum and S. spontaneum genomes, revealed 64080 target site sequences of Sof-RTE in the S. officinarum genome that were used to blasted against the S. spontaneum genome sequence. Only 55 common loci were obtained (Table 2), therefore, the distribution of Sof-RTE transposon has a strong genome specificity.

The chromosome location of each Sof-RTE transposon is shown in Supplementary 6. According to the distribution of Sof-RTE acrcoss different chromosomes of the S. spontaneum genome, there was no linear relationship between homologous chromosomes, that is to say, there was no common target site of Sof-RTE on homologous chromosomes (Fig. 5). It was concluded that the distribution of Sof-RTE in the sugarcane genome was the intergenic region, to a certain extent, which resulted in the differentiation of homologous chromosomes in the sugarcane genome.

 

Discussion

 

With the development of high-throughput sequencing technology, the cost of sequencing is decreasing, making this technology became a common method in genome research. In the study of complex genomes, some steps are usually performed before sequencing to remove unwanted DNA information. ChIP-seq is a technology combining chromatin immunoprecipitation (ChIP) with high throughput sequencing. After enrichment for DNA sequences bound to a specific protein, the library is constructed and sequenced. Zhang et al. (2017) obtained a large number of retrotransposon sequences using this sequencing technique, and successfully developed sugarcane centromere site-specific fluorescence in situ hybridization (FISH) probes using these sequences. To facilitate transposon sequence assembly without a reference genome, we used transcriptome sequencing to study the sugarcane genome and obtained 97106 unigenes of sugarcane. A large amount of retrotransposon information remains in these unigenes.

Sof-RTE is a member of the RTE clade in the Line family, and RTE is a type of widespread horizontal transposon that has been transferred across plant and animal genomes (Gao et al. 2017). Typical Line transposons usually have two open reading frames and a polyA tail at the 3’-end (Kumar and Bennetzen 1999). However, the end of the Sof-RTE is a poly 5’-TTG-3’ sequence, with only one open reading frame in its sequence, and the known full-length sequence of Sof-RTE is approximately 3-kb. Cin4 is a type of Line transposon found in the maize genome; its full-length sequence is above 7 kb; and there are 5’-terminal truncated family members (Schwarz-Sommer et al. 1987). The 5’-terminal truncated Cin4 was also found to be inserted in the waxy gene (Wu et al. 2017). The transposon Sof-RTE shows the same pattern. We believe that Sof-RTE has completed its jump in the genome, resulting in its 5’-terminal truncation and the 5’-terminal truncated Sof-RTE still contains the complete transposon structure.

Polyploidization plays an important role in the genome evolution of eukaryotic organisms, especially plants. The sugarcane genome is very similar to that of sorghum (Sorghum bicolor (L.) Moench). Sorghum is a diploid plant. Before Eight million years ago, sugarcane and sorghum shared a common ancestor (Wang et al. 2010). Sugarcane ancestors subsequently formed octoploids by double polyploidization of the S. robustum and S. spontaneum genomes (D’Hont et al. 1998; Ha et al. 1999; Aitken et al. 2014). It is generally accepted that most plants have one or two rounds of genome-wide duplication after diploidization, and the cycle of polyploidization and diploidization forms a stable genome at present (Comai 2005; Jiao et al. 2011, 2014). In this study, we found that there were few common transposon target sites between the S. spontaneum and S. officinarum genomes, which indicates that a large amount of transposon activity occurred in the independent evolution stage of the two genomes. Perhaps it is the activity of transposons such as Sof-RTE that results in the differentiation of genomes.

 

Fig 3: Structural comparison of full-length and 5’-terminal truncated Sof-RTE

Taking Sof-RTE-ck and S. spontaneum-Sof-RTE-1731 as examples, the full-length and 5’-terminal truncated Sof-RTE transposons were compared. Judging from the transposons with complete TSD structures, both Sof-RTE-ck and S. spontaneum-Sof-RTE-1731 are complete transposons. While S. spontaneum-Sof-RTE-1731 is a 5’-terminal truncated transposon compared to full-length transposons such as Sof-RTE-ck

 

 

Fig 4: Phylogenetic tree of Sof-RTE in the S. spontaneum genome

Twenty Sof-RTE sequences were used to construct the phylogenetic tree. There are transposons with the same conserved domains but low sequence similarity with Sof-RTE-CK in the S. spontaneum genome. By establishing phylogenetic trees, these sequences can be divided into two subclasses, A and B

 

 

Fig 5: Transposon activity causes homologous chromosome differentiation

In the figure, 1A, 1B, 1C and 1D represent four homologous chromosomes of the sugarcane genome; 1-5 indicates the linear gene loci among the homologous chromosomes; and TE1-TE4 indicates transposon insertion sites. There is no linear relationship among the insertion sites of transposons across homologous chromosomes, so it can be assumed that the activity of transposons causes the differentiation of homologous chromosomes

 

Chromosome pairing during meiosis can be observed and shows the homology and evolution of chromosomes. Although the sugarcane genome is polyploid, meiosis mainly occurs through bivalent pairing, while trivalent and monovalent pairing is rare (Bielig et al. 2003). Diploidization after polyploidization has resulted in the existing chromosome behavior in sugarcane. Transposon activity provides an effective pathway for diploidization through chromosome rearrangement (Feldman and Levy 2012). Transposon activity can lead to the expansion and reduction of genome capacity (Schubert and Vu 2016). Transposon activity is usually not so accurate, which causes the change of its target site sequences (Jiang et al. 2004). Even transposons that lose transposable ability still have the potentiality to change chromosome structure (Carvalho and Lupski 2016). Recombination events can occur between homologous regions, even if these are scattered far away in same or different chromosomes, resulting in large-scale deletion, duplication or inversion (Bennetzen and Wang 2014). Transposons also provide micro homologous regions, which predispose to template switching during repair of replication, resulting in chromosome structural variants (Lee et al. 2007). In this study, we found that distribution of the Sof-RTE transposon has a strong genome specificity and that there is no linear relationship between homologous chromosomes. We believe that transposon like Sof-RTE activity is responsible for the differentiation of homologous chromosomes and further results in the diploidization of chromosomal behavior.

Conclusion

 

In this study, we found the Line transposon Sof-RTE through the annotation of sugarcane unigenes. Sof-RTE transposons are high copy number transposons in the sugarcane genome. These transposons were site-specific in the genome and chromosome. The activity of Sof-RTE transposons in the genome results in the structural differences among homologous chromosomes and genomes of sugarcane. We believe that the transposons activity similar to that of Sof-RTE is one of the main driving forces of chromosome and genome differentiation.

 

Acknowledgements

 

This study was supported by the National Natural Science Foundation of China (31560416), the Post-doctoral Targeted Funding of Yunnan Province (Yunren Shetong [2018] 168) and the Yunnan Provincial Science and Technology Department Plan (2016FB067).

 

Conflicts of Interest

 

The authors declare no competing interests.

 

Data Availability

 

Data presented in this study will be available upon reasonable request by the corresponding author.

 

Ethics Approval

 

Not applicable to this paper.

 

References

 

Aitken KS, MD McNeil, PJ Berkman, S Hermann, A Kilian, PC Bundock, JC Li (2014). Comparative mapping in the Poaceae family reveals translocations in the complex polyploid genome of sugarcane. BMC Plant Biol 14; Article 190

Altschul SF, LM Thomas, AS Alejandro, JH Zhang, Z Zhang, M Webb, JL David (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:33893402

Bennetzen JL, H Wang (2014). The contributions of transposable elements to the structure, function, and evolution of plant genomes. Annu Rev Plant Biol 65:505–530

Bielig ML, A Mariani, N Berding (2003). Cytological studies of 2n male gamete formation in sugarcane, Saccharum L. Euphytica 133:137151

Bourque G, KH Burns, M Gehring, V Gorbunova, A Seluanov, M Hammell, M Imbeault, Z Izsvák, HL Levin, TS Macfarlan, DL Mager, C Feschotte (2018). Ten things you should know about transposable elements. Genome Biol 19; Article 199

Brandes EW (1956). Origin, dispersal and use in breeding of the Melanesian Garden sugarcanes and their derivatives, Saccharum officinarum L. Proc Intl Soc Suagr Cane Technol 9:709750

Carvalho CM, JR Lupski (2016). Mechanisms underlying structural variant formation in genomic disorders. Nat Rev Genet 17:224–238

Chen D, X Lu, XY Wu, XM Ying, WJ Long, HS Su, HB Liu, XQ Lin, CH Xu, Q Cai (2020). Transcriptome analysis of axillary bud differentiation in a new dual-axillary bud genotype of sugarcane. Genet Resour Crop Evol 67:685–701

Comai L (2005). The advantages and disadvantages of being polyploidy. Nat Rev Genet 6:836–846

D’Hont A, D Ison, K Alix, C Roux, JC Glaszmann (1998). Determination of basic chromosome numbers in the genus Saccharum by physical mapping of ribosomal RNA genes. Genome 41:221225

DHont A, YH Lu, P Feldmann, JC Glaszmann (1993). Cytoplasmic diversity in sugarcane revealed by heterologous probes. Sugar Cane 1:12–15

Doyle JJ, LE Flage, AH Paterson, RA Rapp, DE Soltis, PS Soltis, JF Wendel (2008). Evolutionary genetics of genome merger and doubling in plants. Annu Rev Genet 42:443–461

Doyle JJ, JL Doyle (1990). Isolation of plant DNA from fresh tissue. Focus 12:1315

Fang JP, YX Que, RK Chen (2014). A Review of Saccharum Origin and its Evolutionary Relationship with Related Genera. Chin J Trop Crops 35:816822 (In Chinese)

Feldman M, AA Levy (2012). Genome evolution due to allopolyploidization in wheat. Genetics 192:763774

Gao DY, Y Chu, H Xia, CM Xu, K Heyduk, B Abernathy, P Ozias-Akins, HJ Leebens-Mack, AS Jackson (2017). Horizontal transfer of non-LTR retrotransposons from arthropods to flowering plants. Mol Biol Evol 35:354–364

Grabherr MG, BJ Haas, M Yassour, JZ Levin, DA Thompson, I Amit, X Adiconis, L Fan, R Raychowdhury, QD Zeng, ZH Chen, E Mauceli, N Hacohen, A Gnirke, N Rhind, F Palma, BW Birren, C Nusbaum, K Lindblad-Toh, N Friedman, A Regev (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652

Ha S, HP Moore, D Heinz, S Kato, N Ohmido, K Fukui (1999). Quantitative chromosome map of the polyploid Saccharum spontaneum by multicolor fluorescence in situ hybridization and imaging methods. Plant Mol Biol 38:1165–1173

Hoarau JY, L Grivet, B Offmann, LM Raboin, JP Diorflar, J Payet, M Hellmann, A D’Hont, JC Glaszmann (2002). Genetic dissection of a modern sugarcane cultivar (Saccharum spp.). II Detection of QTLs for yield components. Theor Appl Genet 105:10271037

Jiang N, Z Bao, X Zhang, SR Eddy, SR Wessler (2004). Pack-MULE transposable elements mediate gene evolution in plants. Nature 431:569–573

Jiao YN, JP Li, HB Tang, HA Patersona (2014). Integrated syntenic and phylogenomic analyses reveal an ancient genome duplication in monocots. Plant Cell 26:27922802

Jiao YN, JN Wickett, S Ayyampalayam, SA Chanderbali, L Landherr, EP Ralph, PL Tomsho, Y Hu, HY Liang, SP Soltis, ED Soltis, WS Clifton, ES Schlarbaum, CS Schuster, H Ma, J Leebens-Mack, WC dePamphilis (2011). Ancestral polyploidy in seed plants and angiosperms. Nature 473:97100

Kumar A, JL Bennetzen (1999). Plant retrotransposons. Annu Rev Genet 33:479–532

Kumar S, K Tamura, M Nei (1994). Mega: Molecular evolutionary genetic analysis software for microcomputers. Comput Appl Biosci 10:189191

Lee JA, CM Carvalho, JR Lupski (2007). A DNA replication mechanism for generating nonrecurrent rearrangements associated with genomic disorders. Cell 131:1235–1247

Leitch A, I Leitch (2008). Genomic plasticity and the diversity of polyploid plants. Science 320:481483

Masterson J (1994). Stomatal size in fossil plants: Evidence for polyploidy in majority of angiosperms. Science 264:421424

Marchler-Bauer A, Y Bo, LY Han, JN He, JC Lanczycki, SN Lu, F Chitsaz, KM Derbyshire, CR Geer, RN Gonzales, M Gwadz, ID Hurwitz, F Lu, Marchler G H., SJ Song, N Thanki, ZX Wang, AR Yamashita, DC Zhang, CJ Zheng, YL Geer, HS Bryant (2017). CDD/SPARCLE: Functional classification of proteins via subfamily domain architectures. Nucleic Acids Res 45:D200–D203

Ming R, HP Moore, KK Wu, A D’hont, CJ Glaszmann, LT Tew, TE Mirkov, J Silva, J Jifon, M Rai, JR Schnell, MS Brumbley, P Lakshmanan, CJ Comstock, HA Paterson (2006). Sugarcane improvement through breeding and biotechnology. Plant Breed Rev 27:15118

Neumann P, P Novák, N Hoštáková, J Macas (2019). Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. Mobile DNA 10; Article 1

Nishiyama MYJ, SS Ferreira, PZ Tang, S Becker, A Pörtner-Taliana, GM Souza (2014). Full-length enriched cDNA libraries and ORFeome analysis of sugarcane hybrid and ancestor genotypes. PloS One 9; Article e107351

Otto SP (2007). The evolutionary consequences of polyploidy. Cell 131:452462

Racedo J, L Gutiérrez, MF Perera, Ostengo S Santiago, EM Pardo, MI Cuenya, B Welin, AP Castagnaro (2016) Genome-wide association mapping of quantitative traits in a breeding population of sugarcane. BMC Plant Biol 16; Article 142

Rozen S, H Skaletsky (2000) Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 132:365386

Santchurna D, K Ramdoyala, MGH Badalooa, MT Labuschagneb (2014). From sugar industry to cane industry: Evaluation and simultaneous selection of different types of high biomass canes. Biomass Bioener 61:8292

Schnable SP, D Ware, SR Fulton, CJ Stein, FS Wei, S Pasternak, CZ Liang, JW Zhang, L Fulton, AT Graves, P Minx, AD Reily, L Courtney, SS Kruchowski, C Tomlinson, C Strong, K Delehaunty, C Fronick, B Courtney, MS Rock, E Belter, FY Du, K Kim, MR Abbott, M Cotton, A Levy, P Marchetto, K Ochoa, MS Jackson, B Gillam, WZ Chen, Yan L, J Higginbotham, M Cardenas, J Waligorski, E Applebaum, L Phelps, J Falcone, K Kanchi, T Thane, A Scimone, N Thane, J Henke, T Wang, J Ruppert, N Shah, K Rotter, J Hodges, E Ingenthron, M Cordes, S Kohlberg, J Sgro, B Delgado, K Mead, A Chinwalla, S Leonard, K Crouse, K Collura, D Kudrna, J Currie, RF He, A Angelova, S Rajasekar, T Mueller, R Lomeli, G Scara, A Ko, K Delaney, M Wissotski, G Lopez, D Campos, M Braidotti, E Ashley, W Golser, H Kim, S Lee, JK Lin, Z Dujmic, W Kim, J Talag, A Zuccolo, CZ Fan, A Sebastian, M Kramer, L Spiegel, L Nascimento, T Zutavern, B Miller, C Ambroise, S Muller, W Spooner, A Narechania, LY Ren, SR Wei, S Kumari, B Faga, JM Levy, L McMahan, PV Buren, WM Vaughn, K Ying, CT Ye, JS Emrich, Y Jia, A Kalyanaraman, AP Hsia, WB Brad, SR Baucom, PT Brutnell, CN Carpita, C Chaparro, JM Chia, JM Deragon, CJ Estill, Y Fu, AJ Jeddeloh, YJ Han, H Lee, PH Li, RD Lisch, SZ Liu, ZJ Liu, DH Nagel, CM McCann, P SanMiguel, MA Myers, D Nettleton, J Nguyen, WB Penning, L Ponnala, LK Schneider, CD Schwartz, A Sharma, C Soderlund, MN Springer, Q Sun, H Wang, M Waterman, R Westerman, KT Wolfgruber, LX Yang, YS Yu, LF Zhang, SG Zhou, QH Zhu, LJ Bennetzen, RD Kelly, JM Jiang, N Jiang, GG Presting, RS Wessler, S Aluru, AR Martienssen, WS Clifton, WM Richard, AR Wing, KR Wilson (2009). The B73 maize genome: Complexity, diversity, and dynamics. Science 326:1112–1115

Schubert I, GTH Vu (2016). Genome stability and evolution: Attempting a holistic view. Trends Plant Sci 21:749–757

Schwarz-Sommer Z, L Leclercq, E Göbel, H Saedler (1987). Cin4, an insert altering the structure of the Al gene in Zea mays, exhibits properties of nonviral retrotransposons. EMBO J 6:3873–3880

The International Wheat Genome Sequencing Consortium (IWGSC) (2014). A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345:114


Thompson DJ, GD Higgins, JT Gibson (1994). CLUSTALW: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl Acids Res 22:4673–4680

Wang JP, B Roe, S Macmil, QY Yu, JE Murray, HB Tang, CX Chen, F Najar, G Wiley, J Bowers, MV Sluys, DS Rokhsar, ME Hudson, SP Moose, AH Paterson, R Ming (2010). Microcollinearity between autopolyploid sugarcane and diploid sorghum genomes. BMC Genomics 11; Article 261

Wicker T, R Sabot, A Hua-Van, LJ Bennetzen, P Capy, B Chalhoub, A Flavell, P Leroy, M Morgante, O Panaud, E Paux, P SanMiguel, HA Schulman (2007). A unified classification system for eukaryotic transposable elements. Nat Rev Genet 8:973–982

Wu XY, D Chen, YQ Lu, WH Liu, XM Yang, XQ Li, J Du, LH Li (2017). Molecular characteristics of two new waxy mutations in China waxy maize. Mol Breed 37:17

Yang CF, LT Yang, YR Li (2014) Origins and evolution of sugarcane. J South Agric 45:17441750 (In Chinese)

Yu J, SN Hu, J Wong GKS Wang, SG Li, B Liu, YJ Deng, L Dai, Y Zhou, XQ Zhang, ML Cao, J Liu, JD Sun, JB Tang, YJ Chen, XB Huang, W Lin, C Ye, W Tong, LJ Cong, JN Geng, YJ Han, L Li, W Li, GQ Hu, XG Huang, WJ Li, J Li, ZW Liu, L Li, JP Liu, QH Qi, JS Liu, L Li, T Li, XG Wang, H Lu, TT Wu, M Zhu, PX Ni, H Han, W Dong, XY Ren, XL Feng, P Cui, XR Li, H Wang, X Xu, WX Zhai, Z Xu, JS Zhang, SJ He, JG Zhang, JC Xu, KL Zhang, XW Zheng, JH Dong, WY Zeng, L Tao, J Ye, J Tan, XD Ren, XW Chen, J He, DF Liu, W Tian, CG Tian, HG Xia, QY Bao, G Li, H Gao, T Cao, J Wang, WM Zhao, P Li, W Chen, XD Wang, Y Zhang, JF Hu, J Wang, S Liu, J Yang, GY Zhang, YQ Xiong, ZJ Li, L Mao, CS Zhou, Z Zhu, RS Chen, BL Hao, WM Zheng, SY Chen, W Guo, GJ Li, SQ Liu, M Tao, J Wang, LH Zhu, LP Yuan, HM Yang (2002). A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. indica). Science 296:7992

Zhang JS, C Nagai, QY Yu, YB Pan, T Ayala-Silva, JR Schnell, CJ Comstock, KA Arumuganathan, R Ming (2012). Genome size variation in three Saccharum species. Euphytica 185:511519

Zhang JS, XT Zhang, HT Tang, Q Hua XT Zhang, XK Ma, F Zhu, T Jones, XG Zhu, J Bowers, CM Wai, CF Zheng, Y Shi, S Chen, XM Xu, JJ Yue, RD Nelson, LX Huang, Li Z Zhen, HM Xu, D Zhou, YJ Wang, WC Hu, JS Lin, YJ Deng, N Pandey, M Mancini, D Zerpa, KJ Nguyen, LM Wang, L Yu, YG Xin, LF Ge, J Arro, OJ Han, S Chakrabarty, M Pushko, WP Zhang, YH Ma, PP Ma, MJ Lv, FM Chen, GY Zheng, JS Xu, ZH Yang, F Deng, XQ Chen, ZY Liao, XX Zhang, ZC Lin, H Lin, HS Yan, Z Kuang, WM Zhong, PP Liang, GF Wang, Y Yuan, JX Shi, JX Hou, JX Lin, JJ Jin, PJ Cao, QC Shen, Q Jiang, P Zhou, YY Ma, XD Zhang, RR Xu, J Liu, YM Zhou, HF Jia, Q Ma, R Qi, ZL Zhang, JP Fang, HK Fang, JJ Song, MJ Wang, GG Dong, G Wang, Z Chen, T Ma, H Liu, RS Dhungana, ES Huss, XP Yang, A Sharma, HJ Trujillo, CM Martinez, M Hudson, JJ Riascos, M Schuler, LQ Chen, MD Braun, L Li, QY Yu, JP Wang, K Wang, CM Schatz, D Heckerman, MV Sluys, GM Souza, HP Moore, D Sankoff, R VanBuren, HA Paterson, C Nagai, R Ming (2018). Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L. Nat Genet 50:1565–1573

Zhang WP, S Zuo, ZJ Li, Z Meng, JL Han, JP Song, YB Pan, K Wang (2017). Isolation and characterization of centromeric repetitive DNA sequences in Saccharum spontaneum. Sci Rep 7; Article 41659